Acoustic modeling for spoken dialogue systems based on unsupervised utterance-based selective training

Authors

  • Tobias Cincarek
  • Tomoki Toda
  • Hiroshi Saruwatari
  • Kiyohiro Shikano
Abstract

The construction of high-performance acoustic models for certain speech recognition tasks is very costly and time-consuming, since it usually requires the collection and transcription of large amounts of task-specific speech data. In this paper, acoustic modeling for spoken dialogue systems based on unsupervised selective training is examined. The main idea is to select training utterances from an untranscribed speech data pool so that the likelihood of a separate, small, transcribed development speech data set is maximized. If only the selected data are used to retrain the initial acoustic models, better performance is achieved than when retraining with all collected data. With the proposed approach it is also possible to considerably reduce the cost of human labeling of the speech data without compromising performance. Furthermore, the method provides a means for automatic task adaptation of acoustic models, e.g. to adult or children's speech. This is important, since detailed information about each automatically collected utterance is usually not available.
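The selection idea described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes one-dimensional features, a single Gaussian standing in for a real acoustic model, and greedy deletion as the search strategy. All function names are hypothetical.

```python
import math

def fit_gaussian(utterances):
    """Maximum-likelihood mean and variance over all frames (toy stand-in for retraining)."""
    frames = [x for utt in utterances for x in utt]
    mean = sum(frames) / len(frames)
    var = sum((x - mean) ** 2 for x in frames) / len(frames)
    return mean, max(var, 1e-6)  # variance floor avoids division by zero

def dev_log_likelihood(model, dev_frames):
    """Log-likelihood of the development frames under the Gaussian model."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in dev_frames
    )

def select_utterances(pool, dev_frames):
    """Greedily drop the utterance whose removal most raises the dev-set likelihood."""
    selected = list(range(len(pool)))
    best = dev_log_likelihood(fit_gaussian([pool[i] for i in selected]), dev_frames)
    while len(selected) > 1:
        drop = None
        for i in selected:
            rest = [pool[j] for j in selected if j != i]
            score = dev_log_likelihood(fit_gaussian(rest), dev_frames)
            if score > best:
                best, drop = score, i
        if drop is None:  # no single removal improves the dev likelihood
            break
        selected.remove(drop)
    return selected, best

# Hypothetical data: utterance 2 comes from a mismatched condition.
pool = [[0.1, -0.2, 0.0], [0.2, 0.1], [5.0, 5.2, 4.9], [-0.1, 0.05]]
dev = [0.0, 0.1, -0.1, 0.05]
selected, score = select_utterances(pool, dev)
```

The pool itself needs no transcriptions here; only the small development set drives the choice, which mirrors the unsupervised setting the abstract describes.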

Similar resources

Robust numeric recognition in spoken language dialogue

This paper addresses the problem of automatic numeric recognition and understanding in spoken language dialogue. We show that accurate numeric understanding in fluent unconstrained speech demands maintaining robustness at several different levels of system design, including acoustic, language, understanding and dialogue. We describe a robust system for numeric recognition and present algorithms f...

Stochastic modeling of semantic content for use in a spoken dialogue system

A key issue in a spoken dialogue system is the successful semantic interpretation of the output from the speech recognizer. Extracting the semantic concepts, i.e. the meaningful phrases, of an utterance is traditionally performed using rule based methods. In this paper we describe a statistical framework for modeling (and decoding) semantic concepts based on discrete hidden Markov models (DHMMs...

Unsupervised Hidden Markov Modeling of Spoken Queries for Spoken Term Detection without Speech Recognition

We propose an unsupervised technique to model the spoken query using hidden Markov model (HMM) for spoken term detection without speech recognition. By unsupervised segmentation, clustering and training, a set of HMMs, referred to as acoustic segment HMMs (ASHMMs), is generated from the spoken archive to model the signal variations and frame trajectories. An unsupervised technique is also desig...

Learning lexicons from spoken utterances based on statistical model selection

This paper proposes a method for the unsupervised learning of lexicons from pairs of a spoken utterance and an object as its meaning without any a priori linguistic knowledge other than a phoneme acoustic model. In order to obtain a lexicon, a statistical model of the joint probability of a spoken utterance and an object is learned based on the minimum description length principle. This model c...

TUKE at MediaEval 2015 QUESST

In this paper, we present our retrieving system for QUery by Example Search on Speech Task (QUESST), comprising the posteriorgram-based modeling approach along with the weighted fast sequential dynamic time warping algorithm (WFS-DTW). For this year, our main effort was directed toward developing language-dependent keyword matching system, utilizing all available information about spoken langua...

Publication date: 2006